Two-Phase LMR-RC Tagging for Chinese Word Segmentation
Abstract
In this paper we present a Two-Phase LMR-RC Tagging scheme for Chinese word segmentation. In the Regular Tagging phase, Chinese sentences are processed similarly to the original LMR Tagging. The tagged sentences are then passed to the Correctional Tagging phase, in which they are re-tagged using extra information from the first-round tagging results. Two training methods, Separated Mode and Integrated Mode, are proposed to construct the models. Experimental results show that our scheme performs best in terms of accuracy in Integrated Mode, whereas Separated Mode is more suitable when computational resources are limited.
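The abstract does not spell out the tag set or the features, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a position-based tag set (`L`, `M`, `R`, and `S` for single-character words) and a hypothetical helper, `correctional_features`, that shows how first-round tags of neighbouring characters could be exposed to a second tagging pass. None of these names come from the paper, and no trained model is included.

```python
from typing import Dict, List


def words_to_tags(words: List[str]) -> List[str]:
    """Label each character by its position within its word.

    Assumed tag set (not taken from the paper):
    L = leftmost, M = middle, R = rightmost, S = single-character word.
    """
    tags: List[str] = []
    for word in words:
        if not word:
            continue  # ignore empty tokens
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["L"] + ["M"] * (len(word) - 2) + ["R"])
    return tags


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Rebuild a segmentation: a word ends after every R or S tag."""
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("R", "S"):
            words.append(current)
            current = ""
    if current:  # flush a trailing word that was never closed
        words.append(current)
    return words


def correctional_features(chars: str, first_round: List[str], i: int) -> Dict[str, str]:
    """Hypothetical second-pass features for character i.

    The second pass sees the character plus the first-round tags of the
    character itself and its immediate neighbours.
    """
    def tag_at(j: int) -> str:
        return first_round[j] if 0 <= j < len(first_round) else "<pad>"

    return {
        "char": chars[i],
        "prev_tag": tag_at(i - 1),
        "own_tag": tag_at(i),
        "next_tag": tag_at(i + 1),
    }
```

For example, `words_to_tags(["中国", "人"])` yields one tag per character, and `tags_to_words` inverts that labelling. In a real system a statistical tagger would predict the first-round tags, and a second model would consume features like those returned by `correctional_features`.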
Similar resources
Chinese Word Segmentation as LMR Tagging
In this paper we present Chinese word segmentation algorithms based on the so-called LMR tagging. Our LMR taggers are implemented with the Maximum Entropy Markov Model and we then use Transformation-Based Learning to combine the results of the two LMR taggers that scan the input in opposite directions. Our system achieves F-scores of and on the Academia Sinica corpus and the Hong Kong City Unive...
Using Part-of-Speech Reranking to Improve Chinese Word Segmentation
Chinese word segmentation and Part-of-Speech (POS) tagging have commonly been considered two separate tasks. In this paper, we present a system that performs Chinese word segmentation and POS tagging simultaneously. We train a segmenter and a tagger model separately based on linear-chain Conditional Random Fields (CRF), using lexical, morphological and semantic features. We propose an approx...
Comparison of the Impact of Word Segmentation on Name Tagging for Chinese and Japanese
Word segmentation is usually considered an essential step for many Chinese and Japanese Natural Language Processing tasks, such as name tagging. This paper presents several new observations and analyses on the impact of word segmentation on name tagging: (1) Due to the limitations of current state-of-the-art Chinese word segmentation performance, a character-based name tagger can outperform its...
Effective Subsequence-based Tagging for Chinese Word Segmentation
Hai Zhao, Chunyu Kit (Department of Chinese, Translation and Linguistics, City University of Hong Kong, 83 Tat Avenue, Kowloon, Hong Kong SAR, China). Abstract: Research on automatic Chinese word segmentation has been advancing rapidly in recent years, especially since the First International Chinese Word Segmentation Bakeo...
Combining Character-Based and Subsequence-Based Tagging for Chinese Word Segmentation
Chinese word segmentation is the initial step in Chinese information processing. The performance of Chinese word segmentation has been greatly improved by character-based approaches in recent years. These approaches treat Chinese word segmentation as a character word-position tagging problem. With the help of powerful sequence tagging models, the character-based method quickly rose as a mainstream tec...